Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

Neural Information Processing Systems

In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis that identifies the size-difference of the action-gap across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
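
The action-gap argument can be made concrete with a small sketch. The following is a toy illustration only, not the paper's method or experiments: it assumes a hypothetical deterministic chain in which the agent receives a reward of 1 on reaching the goal, and in which each state offers a "move" action (one step closer) and a "stay" action (wastes one step). The function name `action_gaps` and the chain itself are assumptions introduced here. Under these assumptions, the action-gap in regular space shrinks geometrically with distance from the goal, while the gap between the logarithms of the two action values is the same in every state.

```python
import math


def action_gaps(gamma, n_states):
    """Optimal action-gaps in a toy chain (hypothetical example, not from the paper).

    For a state d steps from the goal (d = 1..n_states):
      Q*(move) = gamma ** (d - 1)   reward 1 arrives after d steps
      Q*(stay) = gamma * Q*(move)   one step wasted, then act optimally
    Returns the gaps in regular space and in logarithmic space.
    """
    linear_gaps, log_gaps = [], []
    for d in range(1, n_states + 1):
        q_move = gamma ** (d - 1)
        q_stay = gamma * q_move
        linear_gaps.append(q_move - q_stay)
        log_gaps.append(math.log(q_move) - math.log(q_stay))
    return linear_gaps, log_gaps


if __name__ == "__main__":
    linear_gaps, log_gaps = action_gaps(gamma=0.5, n_states=5)
    for d, (lin, lg) in enumerate(zip(linear_gaps, log_gaps), start=1):
        print(f"distance {d}: linear gap {lin:.5f}, log-space gap {lg:.5f}")
```

In this toy setting the regular-space gap equals gamma^(d-1) (1 - gamma), so it varies by a factor of gamma^-(n-1) across the chain, whereas the log-space gap is -log(gamma) everywhere. The method described in the abstract involves more than this identity (for example, learning the mapped values with function approximation), which the sketch does not attempt to reproduce.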


Reviews: Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

I thank the authors for their response. I would like to reemphasize that I like this paper a lot, and I applaud the authors for their work. I strongly suggest an oral accept for this paper. Additionally, I have raised my score by 1 point. However, I still feel that Section 3.1 is somewhat contrived, and I stand by my initial criticism of this section.


Reviews: Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

The authors propose remapping value functions into a logarithmic space, leading to "logarithmic Q-learning" which is demonstrated to perform quite well in practice. This paper has by far the strongest overall scores (9, 9, 8) in my paper batch. All three reviewers are enthusiastic about the paper and its contributions and results. I am recommending that NeurIPS accept the paper for Oral presentation.

